Abstract

We derive an asymptotic power function for a likelihood-based test for inter- 
action in a regression model, with possibly misspecified alternative distribution. 
This allows a general investigation of types of interactions which are poorly or well 
detected via data. Principally we contrast pairwise-interaction models with 'diffuse 
interaction models' as introduced in Gustafson, Kazi, and Levy (2005).

1 Introduction



There has been much discussion about how to define and measure interaction. The 
interaction of two or more covariates can be measured as the difference between the 
joint effect of covariates and the sum of their independent effects, or, in other words, the 
departure away from an additive model. In this paper, we focus on the power of model- 
based tests for the presence of interaction, under misspecified models. That is, with one 
kind of interaction model truly generating the data, another kind of interaction model
is applied for estimation and testing purposes. Our rationale for this is a suspicion that
sometimes a wrong but parsimonious model for interactions may lead to better power 
for detecting departures from additivity than a complex model for interactions, even if 
the complex model is correct. To make comparisons tractable, we derive the asymptotic 
power of the test statistic under a sequence of Pitman-type alternatives, which are getting 
closer to the null (additive) model as the sample size increases. 

In Section 2, under a general framework, we give the asymptotic power function based 
on a misspecified model. The corresponding function for a correct model arises as a special 
case. In Section 3, we apply the general results to a particular comparison between a 
pairwise interaction model (PIM) and a diffuse interaction model (DIM). The latter was 
proposed by Gustafson, Kazi and Levy (2005) as a parsimonious model for interactions 
appropriate for reflecting a general synergism or antagonism in how covariates interact, 
without identifying particular pairs of variables responsible. We find that when the DIM 
is correct, the DIM-based test for interaction is more powerful than the PIM-based test, at 
least in all the specific scenarios we have considered. When the PIM is correct, however,
the comparison is mixed. Depending on the specific nature of the pairwise interactions, 
in terms of directions and relative magnitudes of coefficients, either the DIM-based test 
or the PIM-based test may be more powerful. 

2 General framework 

In this section, we give a general result about the asymptotic power function of a Wald 
(quadratic form) test for the presence of interactions, in the context of a misspecified 
model for the alternative distribution. In fact, the mathematical formalism is more 
general, in terms of describing an arbitrary testing scenario with model misspecification. 
Let $\mathcal{F} = \{f(y \mid x, \theta) : \theta \in \Theta\}$ and
$\mathcal{G} = \{g(y \mid x, \omega) : \omega \in \Omega\}$ denote two different
parametric families of densities for modelling $(Y \mid X_1, \ldots, X_p)$, with
$p_1 = \dim(\theta)$ and $p_2 = \dim(\omega)$. We consider fitting model $\mathcal{F}$ to a
sample of size $n$, and testing the null hypothesis $C\theta = \zeta_0$ against a
non-directional alternative, where $C$ is an $r \times p_1$ matrix of full row rank.
The true data-generating mechanism, however, is taken to be a member of $\mathcal{G}$.
To form a sequence of Pitman-type alternatives (Le Cam 1960), the specific member of
$\mathcal{G}$ generating the data is taken to be $\omega_n = \omega_0 + n^{-1/2}\Delta\eta$,
where $\Delta$ is a scalar and $\eta$ is a vector of unit length. It is assumed that
$g(y \mid x, \omega_0) = f(y \mid x, \theta_0)$ for some $\theta_0 \in \Theta$ with
$C\theta_0 = \zeta_0$. Thus the extent of model misspecification and the extent of
deviation from the null both diminish with $n$.

For the two parametric families, let 

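$$s_F(\theta; Y, X) = \partial \log f(Y \mid X, \theta) / \partial\theta$$

and

$$s_G(\omega; Y, X) = \partial \log g(Y \mid X, \omega) / \partial\omega$$

be the respective score vectors, and

$$I_F(\theta) = E_\theta\{s_F(\theta; Y, X)\, s_F^T(\theta; Y, X)\}$$

and

$$I_G(\omega) = E_\omega\{s_G(\omega; Y, X)\, s_G^T(\omega; Y, X)\}$$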

be the respective Fisher information matrices. Note that here, and in what follows, 
expectations are with respect to the same fixed distribution of X, and the distribution of 
$(Y \mid X)$ based on a member of $\mathcal{F}$ or $\mathcal{G}$, as indicated by a subscript.
Let $\hat\theta_n$ be the maximum likelihood estimator based on fitting $\mathcal{F}$ to $n$
observations arising from $g(y \mid x, \omega_n)$. Then

$$W = n (C\hat\theta_n - \zeta_0)^T \{C I_F^{-1}(\hat\theta_n) C^T\}^{-1} (C\hat\theta_n - \zeta_0)$$

is a Wald test statistic, which would be asymptotically distributed as $\chi^2_r$ with
$r = \mathrm{rank}(C)$ degrees of freedom if the data were generated under the null (i.e.,
via some member of $\mathcal{F}$ with $C\theta = \zeta_0$). However, with data generated as
$g(y \mid x, \omega_n)$, we have

$$n^{1/2}(C\hat\theta_n - \zeta_0) = n^{1/2} C(\hat\theta_n - \theta_0)
  = n^{1/2} C\{\hat\theta_n - \theta_*(\omega_n)\} + n^{1/2} C\{\theta_*(\omega_n) - \theta_0\}, \tag{1}$$



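where $\theta_*(\omega)$ is the parameter vector which minimizes the Kullback-Leibler
information criterion, that is

$$\theta_*(\omega) = \arg\min_\theta E_\omega\left[\log \frac{g(Y \mid X, \omega)}{f(Y \mid X, \theta)}\right]. \tag{2}$$

Note that the fact $g(\cdot \mid \omega_0) = f(\cdot \mid \theta_0)$ yields that
$\theta_*(\omega_0) = \theta_0$.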

By White (1982) we know that the first term on the right side of (1) is asymptotically
normal with mean 0 and covariance matrix $C I_F^{-1}(\theta_0) C^T$. So we only need to
work on the second term. By (2) we know that $\theta_*(\omega)$ satisfies

$$E_\omega\{s_F(\theta_*(\omega); Y, X)\} = 0. \tag{3}$$

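Based on Gustafson (2001), implicit differentiation of (3) gives

$$E_\omega\{s_F'(\theta_*(\omega); Y, X)\} \frac{\partial\theta_*}{\partial\omega}
  + E_\omega\{s_F(\theta_*(\omega); Y, X)\, s_G^T(\omega; Y, X)\} = 0,$$

where $s_F'$ denotes the matrix of derivatives of the score with respect to $\theta$.
Evaluated at $\omega = \omega_0$, the above equality yields

$$\left.\frac{\partial\theta_*}{\partial\omega}\right|_{\omega=\omega_0}
  = I_F^{-1}(\theta_0)\, E_{\theta_0}\{s_F(\theta_0; Y, X)\, s_G^T(\omega_0; Y, X)\}, \tag{4}$$

which follows from the fact that $\theta_*(\omega_0) = \theta_0$, together with the
information identity $E_{\theta_0}\{s_F'(\theta_0; Y, X)\} = -I_F(\theta_0)$. Therefore, we have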

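$$n^{1/2} C\{\theta_*(\omega_n) - \theta_0\}
  = n^{1/2} C\left\{\left.\frac{\partial\theta_*}{\partial\omega}\right|_{\omega=\omega_0} \Delta\eta\, n^{-1/2} + O(n^{-1})\right\}
  \to \Delta C \left.\frac{\partial\theta_*}{\partial\omega}\right|_{\omega=\omega_0} \eta. \tag{5}$$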



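Based on O'Brien et al. (2006), we apply (5) in (1). Hence the asymptotic distribution
of $W$ is noncentral $\chi^2_r(\delta)$, where the noncentrality parameter $\delta$ is given by

$$\begin{aligned}
\delta &= \left.\left\{\Delta C \frac{\partial\theta_*}{\partial\omega}\eta\right\}^T
  \{C I_F^{-1}(\theta_0) C^T\}^{-1}
  \left\{\Delta C \frac{\partial\theta_*}{\partial\omega}\eta\right\}\right|_{\omega=\omega_0} \\
&= \Delta^2 \eta^T E_{\theta_0}\{s_G(\omega_0; Y, X)\, s_F^T(\theta_0; Y, X)\}\, I_F^{-1}(\theta_0) C^T
  \{C I_F^{-1}(\theta_0) C^T\}^{-1} C \\
&\qquad \times I_F^{-1}(\theta_0)\, E_{\theta_0}\{s_F(\theta_0; Y, X)\, s_G^T(\omega_0; Y, X)\}\, \eta.
\end{aligned} \tag{6}$$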

In the case of a correctly specified model, i.e., $\mathcal{F} = \mathcal{G}$, this reduces to

$$\delta = \Delta^2 \eta^T C^T \{C I_F^{-1}(\theta_0) C^T\}^{-1} C \eta.$$

In either case, the asymptotic power of the test statistic $W$ is

$$P\{\chi^2_r(\delta) > \chi^2_{r,\alpha}\},$$

where $\chi^2_{r,\alpha}$ is the upper $\alpha$ quantile of $\chi^2_r$.
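
To make the power formula concrete, the following is a minimal Python sketch (not the authors' code) that evaluates the noncentrality parameter in (6) and the resulting asymptotic power. The function names `noncentrality` and `asymptotic_power`, the argument names, and the use of NumPy and SciPy are assumptions made here for illustration; the matrices $I_F(\theta_0)$ and $E_{\theta_0}\{s_F s_G^T\}$ must be supplied by the user.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def noncentrality(big_delta, eta, C, info_F, cross_FG):
    """Noncentrality parameter of equation (6).

    big_delta : scalar Delta of the Pitman sequence
    eta       : unit-length direction in omega-space, shape (p2,)
    C         : contrast matrix, shape (r, p1)
    info_F    : I_F(theta_0), shape (p1, p1)
    cross_FG  : E{s_F(theta_0) s_G(omega_0)^T}, shape (p1, p2)
    """
    # C (d theta_* / d omega) eta, using equation (4)
    shift = C @ np.linalg.solve(info_F, cross_FG @ eta)
    # C I_F^{-1} C^T
    middle = C @ np.linalg.solve(info_F, C.T)
    return float(big_delta**2 * (shift @ np.linalg.solve(middle, shift)))


def asymptotic_power(nc, r, alpha=0.05):
    """P{chi2_r(nc) > upper-alpha quantile of chi2_r}."""
    crit = chi2.ppf(1.0 - alpha, df=r)
    if nc <= 0.0:
        # central case: the power equals the size of the test
        return float(chi2.sf(crit, df=r))
    return float(ncx2.sf(crit, df=r, nc=nc))
```

In the correctly specified case one would pass `cross_FG = info_F`, which reproduces the reduced form of $\delta$ given above.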

3 Comparison Between Pairwise Interaction Models 
and Diffuse Interaction Models 

3.1 Diffuse Interaction Model 

Greenland (1983) pointed out that the power of statistical tests to detect interactions is 
very low in some commonly encountered epidemiological situations. We certainly envision 
low power in situations where the number of covariates is rather large and only a very 
small fraction of all possible (pairwise) interaction terms really play a role. Gustafson et
al. (2005) proposed another kind of interaction model, the diffuse interaction model, to
deal with difficulties caused by a large number of covariates under pairwise interaction
models. The basic form of this model is best understood in the context of known effect
directions. For instance, say $Y$ represents a health outcome (larger values worse), and
each $X_i$ is a risk factor, scaled to be nonnegative, such that $E(Y \mid X)$ is known a
priori to be non-decreasing in each $X_i$. Then the DIM form is

$$E(Y \mid X_1, \ldots, X_p) = \beta_0 + \left\{\sum_{i=1}^p (\beta_i X_i)^\lambda\right\}^{1/\lambda}, \tag{7}$$

with $\beta_i > 0$ for $i = 1, \ldots, p$. Note that if $X_j = 0$ can be interpreted as 'absence'
of the $j$-th risk factor, then $\beta_j$ can be interpreted as the effect of $X_j$ when all
other risk factors are absent, regardless of the value of $\lambda$. Assuming normal,
homoscedastic errors with $\sigma^2 = \mathrm{Var}(Y \mid X)$, $(\beta, \lambda, \sigma^2)$
comprise the $p + 3$ unknown parameters in the DIM.

To interpret $\lambda$, note first that when $\lambda = 1$, (7) reduces to the usual additive
model. If $\lambda > 1$ though, then the interaction is antagonistic, in the sense that for
$a < b$, $E(Y \mid X_j = b, X_{(j)} = x_{(j)}) - E(Y \mid X_j = a, X_{(j)} = x_{(j)})$ is positive,
but decreasing in each component of $x_{(j)} = (x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p)$. In
the special case where each $X_j$ is binary to indicate absence or presence of a risk factor,
$\lambda > 1$ corresponds to the effect of a particular risk factor diminishing as other risk
factors become present. Conversely, $\lambda < 1$ corresponds to synergism, with the effect
growing as other risk factors become present. Thus the DIM (7) allows for a general
tendency for antagonism or synergism in how multiple risk factors operate on the outcome,
without attempting to model the fine structure of how such antagonism or synergism arises.
That is, $\lambda$ controls a one-parameter extension of an additive model which does not
single out any particular subset of risk factors as being less or more responsible for
interaction.
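
The role of $\lambda$ can be illustrated numerically. The sketch below assumes the DIM mean has the form $\beta_0 + \{\sum_i (\beta_i X_i)^\lambda\}^{1/\lambda}$ with binary risk factors; the helper names `dim_mean` and `effect_of_first_factor`, and the common coefficient value 0.5, are hypothetical choices made only for this example.

```python
def dim_mean(x, beta0, beta, lam):
    """DIM mean beta0 + (sum_i (beta_i * x_i)**lam)**(1/lam), for x_i >= 0, beta_i > 0."""
    total = sum((b * xi) ** lam for b, xi in zip(beta, x))
    return beta0 + total ** (1.0 / lam)


def effect_of_first_factor(others_present, lam, b=0.5):
    """Change in the mean when X_1 goes from 0 to 1, with `others_present`
    other binary risk factors held at 1 and a common coefficient b."""
    beta = [b] * (1 + others_present)
    rest = [1] * others_present
    with_factor = dim_mean([1] + rest, 0.0, beta, lam)
    without_factor = dim_mean([0] + rest, 0.0, beta, lam)
    return with_factor - without_factor


for lam in (0.5, 1.0, 2.0):
    effects = [round(effect_of_first_factor(k, lam), 3) for k in range(4)]
    print(lam, effects)
```

Under these assumptions the printed effects are constant in the number of other factors when $\lambda = 1$, shrink when $\lambda = 2$ (antagonism), and grow when $\lambda = 0.5$ (synergism).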

3.2 Power Comparison between PIM-based and DIM-based Tests 

The standard strategy for modelling interactions involves a pairwise-interaction model 
(PIM). Again assuming normal, homoscedastic errors, 

$$E(Y \mid X_1, \ldots, X_p) = \beta_0 + \sum_{i=1}^p \beta_i X_i + \sum_{i<j} \gamma_{ij} X_i X_j, \tag{8}$$

comprises a model with $p(p+1)/2 + 2$ parameters, $(\beta, \gamma, \sigma^2)$.

To test for departures from additivity then, we could fit the DIM and test the null
that $\lambda = 1$, or fit the PIM and test the null that $\gamma = 0$. In either case, the same
null model arises, i.e., the additive model with $p + 2$ unknown parameters $(\beta, \sigma^2)$.

We can specify a distribution for $X$, values for $(\beta, \sigma^2)$, and a choice of true
alternative (DIM or PIM), and then compute the asymptotic power for both the DIM-based test
and the PIM-based test. When the true alternative is based on DIM, we simply have
$\lambda = 1 + n^{-1/2}\Delta\eta$ (since only a single parameter $\lambda$ describes the
departure from additivity). When the true alternative is based on PIM, we must specify the
$p(p-1)/2$ elements of the unit vector $\eta$, i.e., we must specify how the
pairwise-interaction coefficients deviate from zero. Thus investigating the power to detect
interactions of PIM form is necessarily more involved than in the DIM case.

Note that in all cases the quantities needed to determine the asymptotic power are
expectations of squares and cross products of score vectors for the two models. Some
calculations lack a closed form due to the particular form of the score vector for the
DIM. The components of this score vector are given in the Appendix. It is the element
corresponding to the partial derivative with respect to $\lambda$ (evaluated at $\lambda = 1$)
that causes the difficulty in obtaining an analytical form. At least in situations where the
distribution of $X$ is discrete, all expectations required can be calculated via analytic
expectation (for $Y \mid X$) and finite summation over all possible values of $X$. More
generally, if $X$ follows some continuous distribution, numerical integration is required.
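
The finite-summation approach can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' R code: covariates are taken to be independent Bernoulli(0.5), so every $x \in \{0,1\}^p$ is equally likely; errors are normal and homoscedastic, so the score for $\sigma^2$ is uncorrelated with the scores for the mean parameters and drops out of (6); and the DIM mean is assumed to have the form $\beta_0 + \{\sum_i (\beta_i X_i)^\lambda\}^{1/\lambda}$. The names `xlogx`, `mean_gradients`, and `power_from_gradients` are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import chi2, ncx2


def xlogx(v):
    """Elementwise v*log(v) for v >= 0, with the convention 0*log(0) = 0."""
    return v * np.log(np.where(v > 0, v, 1.0))


def mean_gradients(beta):
    """Gradient of the mean w.r.t. the mean parameters at the additive null,
    for every x in {0,1}^p. Columns: intercept, main effects, interaction part."""
    beta = np.asarray(beta, dtype=float)
    p = len(beta) - 1
    X = np.array(list(itertools.product([0.0, 1.0], repeat=p)))
    ones = np.ones((X.shape[0], 1))
    a = X * beta[1:]                                   # beta_i * x_i
    d_lambda = -xlogx(a.sum(axis=1)) + xlogx(a).sum(axis=1)
    pair_cols = [X[:, i] * X[:, j] for i, j in itertools.combinations(range(p), 2)]
    D_dim = np.column_stack([ones, X, d_lambda])
    D_pim = np.column_stack([ones, X] + pair_cols)
    return D_dim, D_pim


def power_from_gradients(D_fit, D_true, eta, n_add, big_delta, sigma2=1.0, alpha=0.05):
    """Asymptotic power of the Wald test from the fitted model, via equation (6).

    n_add is the number of additive columns (p + 1); eta is the unit direction
    for the interaction parameter(s) of the true model."""
    n = D_fit.shape[0]
    M_ff = D_fit.T @ D_fit / n                  # sigma^2 * I_F, mean block
    M_fg = D_fit.T @ D_true[:, n_add:] / n      # sigma^2 * E{s_F s_G^T}, interaction columns
    M_inv = np.linalg.inv(M_ff)
    shift = (M_inv @ (M_fg @ eta))[n_add:]      # C (d theta_*/d omega) eta
    middle = M_inv[n_add:, n_add:]              # C M_ff^{-1} C^T
    nc = big_delta**2 / sigma2 * (shift @ np.linalg.solve(middle, shift))
    r = D_fit.shape[1] - n_add
    crit = chi2.ppf(1.0 - alpha, df=r)
    if nc <= 0.0:
        return float(alpha)
    return float(ncx2.sf(crit, df=r, nc=nc))
```

For example, with `D_dim, D_pim = mean_gradients([0.0] + [0.5] * 9)`, the call `power_from_gradients(D_dim, D_dim, np.array([1.0]), 10, big_delta)` gives the DIM-based power under a DIM alternative, and replacing the first argument by `D_pim` gives the PIM-based power under the same alternative.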

3.3 Detecting Interactions of DIM Form 

As one particular example, say that $p = 9$ covariates are independent and identically
distributed as Bernoulli(0.5). Say that $\beta' = (0, 0.5 \times 1_9')$, where $1_9$ is a vector
of nine ones, and $\sigma^2 = 1$. Asymptotic power curves (power as a function of $\Delta$) for
the DIM and PIM tests, when the true alternative is DIM, appear in Figure 1. As might be
anticipated, the DIM test has substantially higher power, i.e., one does better if one
models the alternative hypothesis correctly.

We find that this conclusion is maintained as we vary the number and distribution
of covariates, and the values of $\beta$ and $\sigma$. To some extent we can see this
analytically. For instance, changing $\sigma$ has the same effect as considering a different
value of $\Delta$, as the noncentrality parameter in (6) is proportional to $\Delta^2/\sigma^2$.

3.4 Detecting Interactions of PIM Form 

To investigate power when the true alternative follows the PIM, for now we keep the same 
distribution of $X$ and choice of $(\beta, \sigma^2)$ as before, but must consider different
possibilities for $\eta$, the direction in which the pairwise-interaction coefficients deviate
from zero. In an attempt to be somewhat comprehensive, we set up three primary factors as
follows. Factor 1 is the proportion of entries in $\eta$ which are non-zero. Of the non-zero
entries, Factor 2 is the proportion which are positive. Specializing to the case that all
positive entries share the same magnitude and all negative entries share the same magnitude,
Factor 3 is the ratio of the (unique) magnitude of the positive entries to the (unique)
magnitude of the negative entries. Under this specialization, and given the restriction
$\|\eta\| = 1$, specifying the three factors does yield a specific value of $\eta$, up to
permutation. Here the permutation we choose is $(0, \ldots, 0, 1, \ldots, 1, -1, \ldots, -1)$.

We consider three levels for each factor, specifically (0.2, 0.5, 0.8) for Factors 1 and 2,
and (0.5,1,2) for Factor 3. In the following plots (Figures 2 through 4), we refer to the 
three levels of each Factor as low, medium, high respectively. 
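
The construction of $\eta$ from the three factors can be sketched as below. The function name `build_eta` is hypothetical, and the rounding of non-integer counts (for instance 0.2 of the 36 pairwise terms when $p = 9$) is an assumption made here, since the text does not say how such cases are handled.

```python
import math


def build_eta(p, frac_nonzero, frac_positive, magnitude_ratio):
    """Unit vector of pairwise-interaction directions, ordered as
    (zeros, positive entries, negative entries).

    frac_nonzero    : Factor 1, proportion of non-zero entries
    frac_positive   : Factor 2, proportion of non-zero entries that are positive
    magnitude_ratio : Factor 3, positive magnitude divided by negative magnitude
    """
    m = p * (p - 1) // 2                        # number of pairwise terms
    n_nonzero = round(frac_nonzero * m)         # assumption: round to nearest count
    n_pos = round(frac_positive * n_nonzero)
    n_neg = n_nonzero - n_pos
    raw = [0.0] * (m - n_nonzero) + [magnitude_ratio] * n_pos + [-1.0] * n_neg
    norm = math.sqrt(sum(v * v for v in raw))
    if norm == 0.0:
        raise ValueError("eta has no non-zero entries")
    return [v / norm for v in raw]


eta = build_eta(p=9, frac_nonzero=0.5, frac_positive=0.5, magnitude_ratio=2.0)
print(len(eta), sum(v * v for v in eta))
```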

Note first that if the level of Factor 1 is not low and the levels of Factors 2 and 3
are both high (or both low), the DIM-based test is more powerful than the PIM-based test. This point is indicated
by the superiority of DIM-based power curves in the two bottom left (right) panels in 
Figure 2 (Figure 4). In other words, given a moderate to high presence of pairwise- 
interaction terms, if the proportion of the positive (or negative) pairwise terms with 
larger magnitudes overwhelms negative (or positive) ones, i.e., the "overall" interaction 
strength leans to synergism (or antagonism), then DIM works better. However, if the 
proportions of two opposite directions are almost equal and the magnitudes of two signs 
are almost equal as well, DIM does not work well, as shown in the middle columns of 
Figures 2 through 4. 

Note also that the first column and third column are actually identical in Figure
3. This is caused by the asymptotic power being an even function of $\Delta$, which can be
immediately shown by (6). Note then that the primary factors of (0.2, 0.8, 1) with $\Delta > 0$ and
the primary factors of (0.8, 0.2, 1) with $\Delta < 0$ give the same value of $\eta$.

While Figures 2 through 4 compare the DIM and PIM-based tests across different 
settings of the primary factors, there are of course numerous secondary factors which 
might be varied as well. These include the number of covariates and their joint distri- 
bution, as well as the values of the main effect coefficients. As one example, Figures 5 
through 7 make comparisons as per Figures 2 through 4, but with the number of co- 
variates doubled (p = 18 now). Here we see more of a tendency for the DIM-based test 
to compare favourably with the PIM-based test, as we might expect. R code is available
(www.stat.ubc.ca/~gustaf) to carry out the power comparison with primary and
secondary factors set to levels desired by the user.

4 Discussion 

We view our findings as lending some general support for the utility of models, such as 
the DIM, which compromise between the simplicity of additivity and the flexibility of the 
PIM. To elaborate, we do not claim that the DIM will be highly realistic across a large 
range of problems, particularly in the restricted form considered here (all effect directions 
known, interactive behaviour either completely synergistic or completely antagonistic). 
Indeed, Gustafson et al. (2005) extend the DIM to unknown directions and Liu (2007)
considers more general DIM forms whereby one group of covariates may have different 
interactive behaviour than another. Even in the simple form presented here, however, the 
DIM can capture some coarse structure of the regression relationship beyond additivity 
(i.e., a general tendency for synergistic or antagonistic combination of risk factors). In 
contrast, inference in the PIM might be viewed as attempting to recover fine structure 
of nonadditive behaviour. The asymptotic power comparison of PIM and DIM-based 
tests is therefore a convenient way to quantify the extent to which coarse features of non- 
additivity are more easily detected than fine features. It seems interesting that under 
a true PIM-structure, enough cohesion in the direction of the pairwise-term coefficients 
can render the DIM-based test of non-additivity more powerful than the PIM-based 
test. This matches the applied statistics intuition that often data will not inform very 
much about the nature of nonadditivity, hence a coarse descriptor, such as the single 
nonadditivity parameter in the simple DIM, may be appropriate. 

One way in which our stylized treatment of the problem differs from applied practice 
is that we have considered the PIM-based test comparing the additive model with no 
interactions to the full model with all possible pairwise interactions. In practice, particu- 
larly when p is large, one might use a stepwise procedure which potentially seeks a model 
with a few pairwise interactions. Or one might fit the full model and retain only those
pairwise terms with significant coefficients. In either case, multiple comparison issues are
at play, and comparison with the DIM-based approach would require a different strategy 
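than that employed in the present paper.

Appendix

Let $f_D$ denote the density function of $Y \mid X$ and $\theta = (\beta', \lambda, \sigma^2)'$.
Then the score vector for the DIM, $s_D(\theta; Y, X)$, is given by

$$\left. s_D(\theta; Y, X) \right|_{\lambda=1}
  = \left.\left(\frac{\partial \log f_D}{\partial\mu}\frac{\partial\mu}{\partial\beta'},\;
    \frac{\partial \log f_D}{\partial\mu}\frac{\partial\mu}{\partial\lambda},\;
    \frac{\partial \log f_D}{\partial\sigma^2}\right)'\right|_{\lambda=1},$$

where

$$\begin{aligned}
\mu|_{\lambda=1} &= \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \\
\left.\frac{\partial\mu}{\partial\beta_0}\right|_{\lambda=1} &= 1, \\
\left.\frac{\partial\mu}{\partial\beta_j}\right|_{\lambda=1} &= X_j, \quad j = 1, \ldots, p, \\
\left.\frac{\partial\mu}{\partial\lambda}\right|_{\lambda=1}
  &= -\left(\sum_{i=1}^p \beta_i X_i\right)\log\left(\sum_{i=1}^p \beta_i X_i\right)
     + \sum_{i=1}^p \beta_i X_i \log(\beta_i X_i), \\
\left.\frac{\partial \log f_D}{\partial\sigma^2}\right|_{\lambda=1}
  &= \frac{1}{2\sigma^4}\left(Y - \mu|_{\lambda=1}\right)^2 - \frac{1}{2\sigma^2}.
\end{aligned}$$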
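
As a check on the $\lambda$-component, the derivative can be worked out directly. The sketch below assumes the DIM mean has the form $\mu = \beta_0 + \{\sum_i (\beta_i X_i)^\lambda\}^{1/\lambda}$, and introduces the shorthand $a_i = \beta_i X_i$ and $S(\lambda) = \sum_{i=1}^p a_i^\lambda$ purely for this derivation, with the convention $0 \log 0 = 0$ for absent risk factors.

$$\begin{aligned}
\log(\mu - \beta_0) &= \lambda^{-1} \log S(\lambda), \\
\frac{\partial\mu}{\partial\lambda}
  &= (\mu - \beta_0)\left\{-\frac{\log S(\lambda)}{\lambda^2}
     + \frac{1}{\lambda}\,\frac{\sum_{i=1}^p a_i^\lambda \log a_i}{S(\lambda)}\right\}, \\
\left.\frac{\partial\mu}{\partial\lambda}\right|_{\lambda=1}
  &= -\left(\sum_{i=1}^p a_i\right)\log\left(\sum_{i=1}^p a_i\right) + \sum_{i=1}^p a_i \log a_i,
\end{aligned}$$

where the last line uses $\mu - \beta_0 = S(1) = \sum_i a_i$ at $\lambda = 1$. Note that this derivative vanishes whenever at most one $a_i$ is non-zero, so only observations with two or more risk factors present contribute to the $\lambda$-component of the score.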



Figure 1: Asymptotic power for a DIM alternative: $X_i$, $i = 1, \ldots, 9$ ~ Bernoulli(0.5);
the solid line denotes power of the DIM-based test; the dashed line denotes power of the
PIM-based test.

Figure 2: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 9
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 0.5. Solid lines denote
power curves based on diffuse interaction model fitting and dashed lines denote power
curves based on pairwise interaction model fitting.

Figure 3: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 9
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 1. Solid lines denote power
curves based on diffuse interaction model fitting and dashed lines denote power curves
based on pairwise interaction model fitting.

Figure 4: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 9
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 2. Solid lines denote power
curves based on diffuse interaction model fitting and dashed lines denote power curves
based on pairwise interaction model fitting.

Figure 5: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 18
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 0.5. Solid lines denote
power curves based on diffuse interaction model fitting and dashed lines denote power
curves based on pairwise interaction model fitting.

Figure 6: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 18
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 1. Solid lines denote power
curves based on diffuse interaction model fitting and dashed lines denote power curves
based on pairwise interaction model fitting.

Figure 7: Asymptotic power for PIM alternatives, with different choices of $\eta$ and 18
binary covariates. The three rows correspond to the three levels of Factor 1, the columns
correspond to the levels of Factor 2, and Factor 3 is set to 2. Solid lines denote power
curves based on diffuse interaction model fitting and dashed lines denote power curves
based on pairwise interaction model fitting.